Abstract


This report presents a method for learning a control strategy using the hidden Markov model
(HMM), i.e., developing a feedback controller based on HMMs. The HMM is a parametric
model for non-stationary pattern recognition and is well suited to characterizing a doubly
stochastic process involving observable actions and a hidden decision pattern. The control
strategy is encoded by HMMs through a training process. The trained models are then employed
to control the system. The proposed method has been investigated by simulations of a linear
system and an inverted pendulum system. The HMM-based controller provides a novel way
to learn a control strategy and to model the human decision-making process.
1 Introduction


An intelligent controller has the ability to comprehend, reason, and learn about processes
and the environment. Because an intelligent control system is complex, the analytic methods
of control theory are insufficient and inefficient for analyzing and designing such a
system. Various methods have been proposed for designing intelligent control systems, such
as pattern recognition methods [1, 2], fuzzy control [3, 4], and neurocontrol [5, 6, 7].

A controller maps its input onto an appropriate set of control actions. The correspondence
between pattern recognition and control can be considered as the learned response of the
control system to known patterns in the input data. The power of fuzzy sets and neural
networks lies in their ability to represent the mapping of a controller. It has been proved that
any continuous nonlinear mapping can be approximated to any desired accuracy with a finite
set of fuzzy variables, values, and rules [8]. In a typical neural network learning application,
the desired mapping is static. The hidden assumption is that the nonlinear static map
generated by the neural network can adequately represent the system’s behavior [9]. Some
mappings in control systems, however, are non-stationary. Human performance, i.e., the
actions and reactions of humans under specified circumstances, is an example. Actions
reflect human skill in performing a certain task, and reactions reflect the control strategy
applied to the environment. A human associates responses with stimuli, actions with
scenarios, labels with patterns, and effects with causes. Because both human decision and
sensory processes are stochastic, human control actions differ even when inputs are the same.
In short, the human control strategy is non-linear, non-deterministic, and non-stationary,
and it is necessary to model such a mapping with an appropriate tool.

The hidden Markov model (HMM) is a powerful tool for pattern recognition of non-stationary
stochastic processes [10]. An HMM is a doubly stochastic process: the hidden underlying
stochastic process can only be observed through another set of stochastic processes.
In addition, the HMM is a parametric model that can be optimized with efficient algorithms.
HMMs have been successfully used in speech recognition [10, 11, 12, 13]. Recently
their effectiveness has been studied in force analysis, task-context final segmentation, and
extraction of discrete finite-state Markov signals [14, 15, 16]. We have successfully applied
HMMs to human action modeling [17, 18]. Action and reaction are both important aspects
of human performance. For example, playing tennis is a complicated process involving
actions and reactions. A tennis player determines his strategy based on his experience and
the immediate information, such as the ball’s incoming speed and height. The player’s
actions will directly influence his reactions; hence, high-quality performance requires both
good reactions and good actions. In this report, we discuss the problem of reaction modeling
and the application of HMMs to control strategy learning.

The objective of a learning controller is to acquire a control strategy from experience to
achieve certain desired goals. In general, an HMM-based controller can be regarded as a
means of learning a control strategy from a teacher. The control strategy modeled with
HMMs is used for controlling the system. For a given system, the control strategy is
partitioned into certain decision patterns, and these patterns are described by corresponding
HMMs. During the training process, HMMs learn a control strategy by adjusting their
parameters. While controlling the system, at each sampling time the controller evaluates
the feedback signal using the trained HMMs, and their scores (probabilities) are used to
generate the controller output, i.e., the patterns with higher probabilities contribute more
to the controller output.

2 Hidden Markov Modeling 

2.1 Hidden Markov Model 

A hidden Markov model is a collection of finite states connected by transitions. Each state
is characterized by two sets of probabilities: a transition probability, and either a discrete
output probability distribution or a continuous output probability density function which,
given the state, defines the conditional probability of emitting each output symbol from a
finite alphabet or a continuous random vector.

An HMM can be defined by: 

• A set of states $\{S\}$, with an initial state $S_i$ and a final state $S_f$

• The transition probability matrix, $A = \{a_{ij}\}$, where $a_{ij}$ is the transition probability of
taking the transition from state $i$ to state $j$

• The output probability matrix $B$. For a discrete HMM, $B = \{b_j(O_k)\}$, where $O_k$
represents a discrete observation symbol. For a continuous HMM, $B = \{b_j(x)\}$, where
$x$ represents continuous observations of $k$-dimensional random vectors

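In this report, we consider only a discrete HMM. For a discrete HMM, $a_{ij}$ and $b_j(O_k)$ have
the following properties:

$$a_{ij} \ge 0, \quad b_j(O_k) \ge 0, \tag{1}$$

$$\sum_{j} a_{ij} = 1, \quad \forall i, \tag{2}$$

$$\sum_{k} b_j(O_k) = 1, \quad \forall j. \tag{3}$$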

If the initial state distribution is $\pi = \{\pi_i\}$, the complete parameter set of the HMM can be
expressed compactly as

$$\lambda = (A, B, \pi). \tag{4}$$

For a detailed description of the theory and computation of HMMs, the reader is referred to
[10, 12].
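
As an illustration only, the parameter set of a discrete HMM and the constraints in Equations (1) to (3) can be sketched as follows. The class name `DiscreteHMM`, the function `is_valid`, and the two-state example values are assumptions made for this sketch and do not come from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    """Parameter set (A, B, pi) of a discrete HMM (hypothetical container)."""
    A: List[List[float]]   # A[i][j]: probability of the transition from state i to state j
    B: List[List[float]]   # B[j][k]: probability of emitting symbol k in state j
    pi: List[float]        # pi[i]: probability of starting in state i


def is_valid(hmm: DiscreteHMM, tol: float = 1e-9) -> bool:
    """Check non-negativity and that every row of A, every row of B, and pi sum to 1."""
    rows = hmm.A + hmm.B + [hmm.pi]
    nonnegative = all(p >= 0.0 for row in rows for p in row)
    normalized = all(abs(sum(row) - 1.0) <= tol for row in rows)
    return nonnegative and normalized


# Illustrative two-state, two-symbol model (values chosen arbitrarily).
model = DiscreteHMM(
    A=[[0.7, 0.3], [0.0, 1.0]],
    B=[[0.9, 0.1], [0.2, 0.8]],
    pi=[1.0, 0.0],
)
```

Here `B` is stored with one row per state, so each row is the output distribution of one state.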
2.2 Concept and Approach


Pattern recognition can be used for classifying objects or processes of unknown origin into
predetermined classes. The problem of pattern recognition is usually characterized by a
description and classification of a set of objects or processes. Many control problems can
be described as pattern recognition problems. For example, in an on-off room-temperature
control system, the task of the controller is to maintain the room temperature based on two
patterns. However, controllers usually have to identify an infinite number of patterns, and
their outputs are continuous. Therefore, more complex techniques are required for controlling
such systems based on a pattern recognition approach.

In a feedback system, the controller input is the error signal and the output is the control
signal. The feedback control strategy, in general, is a function of the error signal, its
history, and the control signal history. Usually, the feedback and control signals are
multi-dimensional. If the control strategy depends only on the finite history of the feedback
signal, the critical issue is to determine the decision patterns, i.e., the number of HMMs for
characterizing the decision patterns, because no fixed patterns are available for a controller.
In order to obtain finite decision patterns, we can partition the control space into finite
patterns, model these patterns, and then generate control signals based on these models. We
have developed the following technique to implement an HMM-based controller.

1. Record the controller input and output data, which are the feedback signals $e(i)$ and
control signals $u(i)$. These data will be used for training HMMs. Assuming $u(i)$
depends only on $k$ samplings of $e(i)$, the correspondence between $u(i)$ and the feedback
signals can be established.

2. Partition control patterns and quantize the feedback signals into finite levels.

(a) Partition $[U_{min}, U_{max}]$ into $N$ patterns:

$$U = \{U_1, U_2, \ldots, U_N\}. \tag{5}$$

(b) At each time $i = 1, 2, \ldots, M$, $u(i)$ belongs to one of the patterns, $U_j$, and
corresponds to a set of sequences $\{E(i)\}$, where

$$\{E(i)\} = \{e(i), e(i-1), e(i-2), \ldots, e(i-k)\}, \quad k > 0. \tag{6}$$

3. Handmark each $\{E(i)\}$ to the corresponding one of $\{U_j\}$, where $i = 1, 2, \ldots, M$ and
$j = 1, 2, \ldots, N$.

4. Use an HMM to describe each $U_j$, $j = 1, 2, \ldots, N$, and train the models with the data.

5. Put the trained models into the model bank for controlling the system. At each
sampling time, $\{E(i)\}$ is scored by all trained models to obtain the probabilities of the
decision patterns $U_j$, $j = 1, 2, \ldots, N$, matching $\{E(i)\}$:

$$P(U_1), P(U_2), \ldots, P(U_N). \tag{7}$$

Then $P(U_j)$, $j = 1, 2, \ldots, N$, are sorted in decreasing order and the first $m$ values
$P(U_j)$ are employed as weights to compute the controller output:

$$u = \frac{\sum_{j=1}^{m} P(U_j)\, u'_j}{\sum_{j=1}^{m} P(U_j)}, \tag{8}$$

where $U_j$, $j = 1, 2, \ldots, m$, stand for the sorted patterns and $u'_j$ stands for the
corresponding control signal values.
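
The last step of the procedure, a probability-weighted average over the $m$ best-scoring patterns, can be sketched as below. This is a minimal illustration of the form of Equation (8); the function name `controller_output` and the example scores and values are hypothetical.

```python
from typing import Sequence


def controller_output(scores: Sequence[float], values: Sequence[float], m: int) -> float:
    """Weighted average of the representative control values of the m best-scoring patterns.

    scores[j] plays the role of P(U_j); values[j] is the control value assigned to pattern j.
    """
    if len(scores) != len(values) or not 1 <= m <= len(scores):
        raise ValueError("need one value per pattern and 1 <= m <= number of patterns")
    # Sort pattern indices by decreasing score and keep the first m.
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
    total = sum(scores[j] for j in top)
    if total == 0.0:
        raise ValueError("all selected scores are zero")
    return sum(scores[j] * values[j] for j in top) / total


scores = [0.1, 0.6, 0.3]    # hypothetical pattern probabilities
values = [-1.0, 0.0, 1.0]   # hypothetical representative control values
u = controller_output(scores, values, m=2)
```

With `m = 1` the controller simply outputs the value of the most likely pattern; larger `m` blends neighboring patterns.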


In the next section, we discuss how to develop an HMM-based controller in detail.


3 HMM-Based Controller 

Figure 1 shows the basic configuration of an HMM-based controller, which consists of four
components: signal preprocessor, pattern evaluation unit, model bank, and control signal
generator. The signal preprocessor measures the values of feedback signals, maps the range
of values of measured signals onto corresponding universes of discourse, and converts the
input data into suitable symbols which will be used by HMMs. The pattern evaluation
unit estimates the probabilities that the input signals match the models in the model
bank. The model bank is the kernel of an HMM-based controller; it contains the trained
HMMs which represent the most likely decision patterns of the controller. The models in the
model bank are trained by training examples. The control signal generator generates control
signals based on the pattern evaluations, and converts the range of values of the control
signals into corresponding universes of discourse.


3.1 Preprocessor 

The measured feedback signals are preprocessed for appropriate enhancement. First they
are filtered to eliminate noise. Then the resulting samples are converted into finite symbols
because we use discrete HMMs for describing decision patterns. In a multi-input system, the
feedback signals are sequential vectors. The vector quantization (VQ) technique [19] is a
suitable tool to map real vectors onto finite symbols. A vector quantizer is completely
determined by a codebook, which consists of a set of fixed prototype vectors. A description
of the VQ process includes: (1) the distortion measure, and (2) the generation of a certain
number of prototype vectors. In the implementation, the squared error distortion measure is
used, i.e.,

$$d(x, \hat{x}) = \|x - \hat{x}\|^2 = \sum_{i=0}^{k-1} (x_i - \hat{x}_i)^2. \tag{9}$$
Figure 1: An HMM-based control system

The codebook is generated by the VQ algorithm. We use the LBG algorithm to produce the
VQ codebook [20]. The LBG algorithm iteratively splits the training data into $2, 4, \ldots, 2^m$
partitions with a centroid for each partition.

For single-input single-output (SISO) systems, we can simply use a scalar quantization
technique to map the feedback signal onto finite symbols.
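
The symbol-assignment step of the preprocessor can be sketched as follows, under two assumptions that are not stated in the report: the codebook is already available (the LBG training itself is not shown), and the scalar quantizer is uniform with clipping at the range limits. All function names are hypothetical.

```python
from typing import Sequence


def squared_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared error distortion between a vector and a prototype vector."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def vq_symbol(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the prototype vector closest to x under the squared error measure."""
    return min(range(len(codebook)), key=lambda c: squared_error(x, codebook[c]))


def scalar_symbol(e: float, lo: float, hi: float, levels: int = 256) -> int:
    """Uniform scalar quantization of e over [lo, hi] into symbols 0..levels-1 (assumed scheme)."""
    if hi <= lo or levels < 1:
        raise ValueError("need lo < hi and at least one level")
    fraction = (e - lo) / (hi - lo)
    # Clip so that out-of-range samples map to the first or last symbol.
    return min(levels - 1, max(0, int(fraction * levels)))


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # illustrative prototypes
symbol = vq_symbol([0.9, 1.1], codebook)
```

The default of 256 levels mirrors the quantization level used in the case studies; the range `[lo, hi]` must be chosen from the recorded data.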


3.2 Pattern Evaluation Unit 

Pattern evaluation is the problem of determining the probability that a given HMM $\lambda$
generates a sequence $O = O_1 O_2 \cdots O_T$. The most straightforward way to calculate the
probability of the observation sequence $O = O_1 O_2 \cdots O_T$ is through enumerating every
possible state sequence of length $T$. For every fixed state sequence $S = s_1 s_2 \cdots s_T$,
based on the assumptions we stated, the probability of the observation sequence $O$,
$P(O|S, \lambda)$, can be computed as follows:

$$P(O|S, \lambda) = b_{s_1}(O_1)\, b_{s_2}(O_2) \cdots b_{s_T}(O_T). \tag{10}$$

On the other hand, the probability of a state sequence $S$ can also be written as

$$P(S|\lambda) = \pi_{s_1} a_{s_1 s_2} a_{s_2 s_3} \cdots a_{s_{T-1} s_T}. \tag{11}$$

The joint probability of $O$ and $S$, i.e., the probability that $O$ and $S$ occur simultaneously, is
simply the product of the above two terms:

$$P(O, S|\lambda) = P(O|S, \lambda)\, P(S|\lambda). \tag{12}$$

The probability $P(O|\lambda)$ is the summation of this joint probability over all possible state
sequences:

$$P(O|\lambda) = \sum_{\text{all } S} P(O|S, \lambda)\, P(S|\lambda) = \sum_{\text{all } S} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(O_t), \tag{13}$$

where $a_{s_0 s_1}$ denotes $\pi_{s_1}$.

Note that the computation for the above equation is on the order of $O(N^T)$, because we must
go through $N$ possible states at every time $t = 1, 2, \ldots, T$. Even for small values of $T$
and $N$, this computation is expensive. For example, if $N = 5$ and $T = 100$, there are $5^{100}$
state sequences, each requiring about $2T$ multiplications, so the required computation
is on the order of $2 \cdot 100 \cdot 5^{100} \approx 10^{72}$. To avoid this computation expense, we can use
a more efficient algorithm known as the Forward-Backward algorithm [12].

Forward algorithm 

The forward variable $\alpha_t(i)$ is defined as

$$\alpha_t(i) = P(O_1 O_2 \cdots O_t, s_t = i \,|\, \lambda). \tag{14}$$

This is the probability of the partial observation sequence up to time $t$ together with state
$S_i$ being reached at time $t$, given the model $\lambda$. This probability can be inductively
computed by the following steps:

1. Initialization:

$$\alpha_1(i) = \pi_i b_i(O_1), \quad 1 \le i \le N. \tag{15}$$

2. Induction:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1}), \quad 1 \le t \le T-1, \quad 1 \le j \le N. \tag{16}$$

3. Termination:

$$P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{17}$$

With this algorithm, the computation of $\alpha_t(j)$ is only on the order of $O(N^2 T)$.
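
A minimal sketch of the forward recursion, together with the brute-force enumeration it replaces, is given below for comparison on a small model. The function names and the example model are assumptions of this sketch; `B[j][k]` is taken to be the probability of symbol `k` in state `j`, and observations are integer symbol indices.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def forward_probability(A: Matrix, B: Matrix, pi: Sequence[float], obs: Sequence[int]) -> float:
    """P(O | model) by the forward recursion: initialization, induction, termination."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]            # initialization
    for o in obs[1:]:                                            # induction over time
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)                                            # termination


def brute_force_probability(A: Matrix, B: Matrix, pi: Sequence[float], obs: Sequence[int]) -> float:
    """P(O | model) by summing the joint probability over every state sequence."""
    n = len(pi)
    total = 0.0
    for states in product(range(n), repeat=len(obs)):
        p = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total


A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [1.0, 0.0]
p_forward = forward_probability(A, B, pi, [0, 1])
```

The enumeration is only practical for very short sequences, which is why it is used here purely as a cross-check.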


Backward algorithm 

In a similar way, the backward variable $\beta_t(i)$ is defined as

$$\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \,|\, s_t = i, \lambda), \tag{18}$$

i.e., the probability of the partial observation sequence from $t + 1$ to the final observation $T$,
given state $i$ at time $t$ and the model $\lambda$. The backward variable is computed in the following
manner.

1. Initialization:

$$\beta_T(i) = 1, \quad 1 \le i \le N. \tag{19}$$

2. Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N. \tag{20}$$

3. Termination:

$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i\, b_i(O_1)\, \beta_1(i). \tag{21}$$

The computational complexity of $\beta_t(i)$ is approximately the same as that of $\alpha_t(i)$. Both
the Forward and Backward algorithms can be used for computing $P(O|\lambda)$. However, in the
problem of recognition, we need to compute $P(\lambda|O)$. Using Bayes’ formula, $P(\lambda|O)$ can be
obtained by

$$P(\lambda|O) = \frac{P(O|\lambda)\, P(\lambda)}{P(O)}. \tag{22}$$
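
The backward recursion and the Bayes step can be sketched in the same style. This is an illustration under the assumption that `B[j][k]` is the probability of symbol `k` in state `j`; the function names and numbers are hypothetical. Because $P(O)$ is common to all models, the posterior is obtained by normalizing the products of likelihood and prior over the model bank.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def backward_probability(A: Matrix, B: Matrix, pi: Sequence[float], obs: Sequence[int]) -> float:
    """P(O | model) by the backward recursion: initialization, induction, termination."""
    n = len(pi)
    beta = [1.0] * n                                             # initialization at t = T
    for o in reversed(obs[1:]):                                  # induction, t = T-1, ..., 1
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n)) for i in range(n)]
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))  # termination


def model_posteriors(likelihoods: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Posterior of each model in the bank, normalizing likelihood times prior."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("no model explains the observation")
    return [x / total for x in joint]


A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [1.0, 0.0]
p_backward = backward_probability(A, B, pi, [0, 1])
posteriors = model_posteriors([0.2, 0.6], [0.5, 0.5])
```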


3.3 Model Bank 


In order to characterize the decision patterns, each pattern is described by an HMM. Since
the feedback signal is a time sequence, the underlying state sequence associated with the
model has the property that, as time increases, the state index increases (or stays the same),
i.e., the states proceed from left to right. Supposing the control actions depend on the most
recent $m$ feedback samples, we can use an $n$-state ($n < m$) left-right HMM, the so-called
Bakis model, to describe the pattern as shown in Figure 2.


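The transition matrix in this case is

$$A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & 0 & \cdots & 0 \\
0 & a_{22} & a_{23} & a_{24} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & a_{n-2,n-2} & a_{n-2,n-1} & a_{n-2,n} \\
0 & \cdots & 0 & 0 & a_{n-1,n-1} & a_{n-1,n} \\
0 & \cdots & 0 & 0 & 0 & a_{nn}
\end{pmatrix} \tag{23}$$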
Figure 2: 5-state left-right HMM


Clearly this model has fewer parameters than an ergodic, or fully connected, HMM.
Furthermore, the initial state probabilities have the property

$$\pi_i = \begin{cases} 0, & i \ne 1, \\ 1, & i = 1. \end{cases} \tag{24}$$


Moreover, the state transition coefficients of state $n$ are specified as

$$a_{nn} = 1, \quad a_{n,i} = 0, \quad i < n. \tag{25}$$


If we convert the feedback signals into $p$ symbols by certain signal processing techniques, the
$B$ matrix is an $n \times p$ matrix.
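
The structure of a left-right model can be made concrete with a small constructor. This is only a sketch: it assumes that each state may stay, advance by one, or advance by two states, as suggested by the banded matrix above, and it fills the allowed entries uniformly as a simple placeholder. The function name `left_right_hmm` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def left_right_hmm(n: int, p: int, max_jump: int = 2) -> Tuple[Matrix, Matrix, List[float]]:
    """Placeholder parameters (A, B, pi) for an n-state left-right HMM with p symbols."""
    A: Matrix = []
    for i in range(n):
        last = min(i + max_jump, n - 1)      # furthest state reachable from state i
        width = last - i + 1
        # Zero below the diagonal and beyond the band; uniform over allowed transitions.
        A.append([1.0 / width if i <= j <= last else 0.0 for j in range(n)])
    B = [[1.0 / p] * p for _ in range(n)]    # n x p output matrix, uniform rows
    pi = [1.0] + [0.0] * (n - 1)             # the model always starts in the first state
    return A, B, pi


A, B, pi = left_right_hmm(n=5, p=256)
```

The last row of `A` reduces to a single self-transition of probability one, so the final state is absorbing.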

Learning is achieved by adjusting the HMM parameters $(A, B, \pi)$ to maximize the probability
of the observation sequence. If the model parameters are known, we can compute the
probabilities of an observation produced by the given model parameters and then update the
model parameters based on the current probabilities. If the model parameters are unknown,
however, no analytic method is available for updating the model parameters.

An iterative algorithm is used to update the model parameters. For any model $\lambda$ with
non-zero parameters, we first define the posterior probability of transitions $\gamma_t(i,j)$, from
state $i$ to state $j$, given the model and the observation sequence,

$$\gamma_t(i,j) = P(s_t = i, s_{t+1} = j \,|\, O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O|\lambda)}. \tag{26}$$

Similarly, the posterior probability of being in state $i$ at time $t$, $\gamma_t(i)$, given the observation
sequence and model, is defined as

$$\gamma_t(i) = P(s_t = i \,|\, O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O|\lambda)}. \tag{27}$$

The expression $\sum_{t=1}^{T-1} \gamma_t(i)$ can be interpreted as the expected (over time) number of times
that state $S_i$ is visited, or, the expected number of transitions made from state $S_i$ if time
slot $t = T$ is excluded from the summation. Similarly, the summation of $\gamma_t(i,j)$ from $t = 1$ to
$t = T-1$ can be interpreted as the expected number of transitions from state $S_i$ to state $S_j$.

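Using the above formulas and the concept of counting event occurrences, a new model
$\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ can be created to improve the old model $\lambda = (A, B, \pi)$. A set of
reasonable reestimation formulas for $\pi$, $A$, and $B$ is:

$$\bar{\pi}_i = \gamma_1(i), \tag{28}$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \gamma_t(i,j)}{\sum_{t=1}^{T-1} \sum_{j} \gamma_t(i,j)}, \tag{29}$$

$$\bar{b}_j(k) = \frac{\sum_{t:\, O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{30}$$

$$j = 1, 2, \ldots, N, \quad k = 1, 2, \ldots, M,$$

where $v_k$ is the observation symbol.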

Equations (28) to (30) are extensions of the Baum-Welch reestimation algorithm [22]. The
Baum-Welch algorithm gives the maximum likelihood estimate of the HMM and can be used
to obtain the model which describes the most likely performance for a given pattern. It has
been proved that either the initial model $\lambda$ defines a critical point of the likelihood function,
where the new estimates equal the old ones, or the model $\bar{\lambda}$ is more likely to produce the
given observation sequence, i.e., $P(O|\bar{\lambda}) > P(O|\lambda)$.

By repeating the above reestimation and replacing $\lambda$ with $\bar{\lambda}$, we ensure that $P(O|\lambda)$ can be
improved until a limiting point is reached.
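
One reestimation pass for a single observation sequence can be sketched as follows. This is a plain, unscaled illustration of the standard Baum-Welch update and not the authors' implementation: it would underflow on long sequences, it handles one sequence only, and all function names are hypothetical. `B[j][k]` is the probability of symbol `k` in state `j`.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_vars(A: Matrix, B: Matrix, pi: Sequence[float], obs: Sequence[int]) -> Matrix:
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward_vars(A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def baum_welch_step(A: Matrix, B: Matrix, pi: Sequence[float],
                    obs: Sequence[int]) -> Tuple[Matrix, Matrix, List[float], float]:
    """Return reestimated (A, B, pi) and the likelihood of obs under the old model."""
    n, p, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_vars(A, B, pi, obs), backward_vars(A, B, obs)
    prob = sum(alpha[-1])
    # Posterior state occupancy and posterior transition probabilities.
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A, new_B = [], []
    for i in range(n):
        denom = sum(xi[t][i][j] for t in range(T - 1) for j in range(n))
        # Keep the old row if the state is never left (zero expected transitions).
        new_A.append([sum(xi[t][i][j] for t in range(T - 1)) / denom if denom > 0 else A[i][j]
                      for j in range(n)])
    for j in range(n):
        denom = sum(gamma[t][j] for t in range(T))
        new_B.append([sum(gamma[t][j] for t in range(T) if obs[t] == k) / denom
                      if denom > 0 else B[j][k] for k in range(p)])
    return new_A, new_B, new_pi, prob
```

Calling `baum_welch_step` repeatedly, feeding each result back in, gives the iterative procedure described above; the returned likelihood can be monitored to decide when to stop.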


3.4 Control Signal Generator 

Equation (22) provides a way to evaluate how well the feedback signal matches the HMMs
in the model bank. Because $P(O)$ is a constant for a given input, only $P(O|\lambda)P(\lambda)$ needs
computing. $P(\lambda)$ is the probability that the control pattern is used. This problem can be
solved with a syntactic method [21].

Syntactic methods assume that patterns are composed of subpatterns in the same way that
phrases and sentences are built by concatenating words and words are built by concatenating
characters. The structure information of the patterns is provided by the so-called “pattern
description language.” The “grammar” of the pattern description language governs the
composition of subpatterns into patterns. Once each subpattern within the pattern has
been recognized, the recognition process is accomplished by performing a syntax analysis.
The syntax analysis generates a structure description of the sentence representing the given
pattern. Based on the syntactic approach, $P(\lambda)$ is determined by the “grammar.” The
grammar determines the probability with which one control action is followed by another
action. This structure information can introduce control history into the current control
output to avoid unexpected control output. $P(\lambda)$ can be learned from the training data or
assigned by experience. In the simplest case, if all the control patterns are equally likely to
be used, only the term $P(O|\lambda)$ is a variable. In this case, Equation (8) can be rewritten as:


$$u = \frac{\sum_{j=1}^{m} P(O|\lambda_j)\, u(j)}{\sum_{j=1}^{m} P(O|\lambda_j)}. \tag{31}$$


Since only a single value $u(j)$, $j = 1, 2, \ldots, m$, is used for each pattern, this single value
must be computed. One way to do this is to simply take the center point of the
pattern in the control space. An alternative method is to use the mean value of all the training
data belonging to the pattern. We have tried both methods and found that both of them
were effective.


4 Case Studies 

To evaluate the validity and effectiveness of the proposed method, we have carried out two
case studies. We first examined the HMM-based controller for a sun-seeker control system.
The system is mounted on a space vehicle to track the sun. We then applied the HMM learning
controller to a benchmark problem in intelligent control: inverted pendulum balancing. The
problem is challenging because of its nonlinear unstable behavior.

Conceptually, if we can obtain good results with these systems, we have sufficient reasons to
expect the HMM controller to work well for other systems, because the only difference lies in
the data measurement and processing. We performed simulations to examine the feasibility
of the method for handling well-defined problems and to reveal various important issues such
as parameterization and stability.


4.1 Linear System 


The system is shown in Figure 3, and the transfer function of the plant is:

$$G_p(s) = \frac{2500}{s(s + 25)}. \tag{32}$$
Figure 3: Block diagram of the sun-seeker control system

This is a second order linear system. To obtain the training data, we employed a phase-lead
controller as the “teacher.” The controller transfer function is:

$$G_c(s) = \frac{1 + 0.0342s}{1 + 0.00588s}. \tag{33}$$

The noise $n(t)$ is Gaussian white noise with zero mean and variance 0.01, i.e.,

$$E\{n(t)\} = 0, \quad E\{n(t)\, n(t)^T\} = 0.01. \tag{34}$$

The input/output data of the controller were recorded while the phase-lead controller
controlled the system. We assumed that the decision patterns are related to ten consecutive
samples of the error signal, i.e., ten samples of the error signal are used for decision
making. Because this is an SISO system, we can use a scalar quantization technique to
convert the error signal into finite symbols. The feedback signal was quantized into 256 levels
and the controller output was based on 26 decision patterns. Therefore, there is a total of
26 HMMs in the model bank and 256 symbols in the output probability distribution
functions of each discrete HMM. Each input sample corresponded to one of 256 symbols, and
each sequence of 10 symbols is associated with an HMM. To describe the patterns, we used
five-state left-right HMMs. Letting $n = 5$, we can obtain the form of the transition matrix $A$,
the initial state probabilities, and the state transition coefficients of state 5 from
Equations (23) to (25). The observation probability matrix $B$ is a 256 x 5 matrix where each
column represents the observation probability distribution for one state.

The training data were obtained by providing a unit square wave with a changeable width
as reference input to the system (see Figure 3). After collecting 9000 samples of controller
input/output data, we partitioned each control command into one of the 26 patterns and
converted each feedback sample into one of 256 symbols. Then we handmarked each of the 26
patterns with the corresponding feedback symbols for training. To initialize the model
parameters, we set the output probabilities to $\frac{1}{256}$, where 256 is the number of quantization
levels. The transition probabilities were initialized with uniformly distributed random
numbers. With these initial parameters, the Forward-Backward algorithm was run recursively on
the training data.


Figure 4: Step responses of the system

The Baum-Welch algorithm was used iteratively to reestimate the parameters based on the 
forward and backward variables. After each iteration, the output probability distributions 
were smoothed using a floor between 0.0001 and 0.00001 and renormalized to meet stochastic 
constraints. Twelve iterations were run for the training processes. 
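
The flooring step can be sketched as follows. This is an assumed implementation of the smoothing described above: every probability below a floor is raised to the floor and the row is rescaled so that it sums to one again. The function name and the floor value of 0.0001 (one end of the stated range) are choices made for this illustration.

```python
from typing import List, Sequence


def floor_and_renormalize(row: Sequence[float], floor: float = 1e-4) -> List[float]:
    """Raise small probabilities to the floor, then rescale the row to sum to 1."""
    floored = [max(p, floor) for p in row]
    total = sum(floored)
    return [p / total for p in floored]


smoothed = floor_and_renormalize([0.5, 0.5, 0.0])
```

Flooring keeps symbols that never occurred in the training data for a state from receiving zero probability, which would otherwise force the score of any sequence containing them to zero. After rescaling, the smallest entries end up slightly below the nominal floor.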

The trained HMMs were then put into the model bank and the HMM-based controller was
employed to control the system. Figure 4 shows a comparison of step responses between
the phase-lead controlled system and the HMM-based control system. We notice that the
overshoot of the HMM-based system is less than that of the phase-lead controlled system,
but the steady state error of the HMM-based system is larger. Figure 5 shows the tracking
performance of the HMM-based controller. The HMM-based control system tracks a square
wave reference input well. In all simulations, we added Gaussian white noise with zero
mean and variance 0.01. These results demonstrate that HMMs are able to learn a control
strategy and to control systems.

In the HMM-based controller, the probability of an HMM matching the feedback signal
pattern determines the contribution of the HMM to the controller output, i.e., the probability
is a weight. However, there is no criterion for selecting $m$ in Equation (8). The relation between
$m$ and the sum of squared errors for the system discussed in this report is shown in Figure 6.
The sum of squared errors is a minimum when $m = 8$ and remains unchanged when
$m > 13$. This implies that, for computing control outputs, using a larger number of patterns
does not guarantee better performance.


4.2 Inverted Pendulum 

The inverted pendulum system shown in Figure 7 consists of an inverted pendulum of length
$L$ and mass $m$ and a cart of mass $M$. The pivot of the pendulum is mounted on a cart
which can move in a horizontal direction. The cart is controlled by the horizontal force $u$,
the input to the system.

Figure 7: Inverted pendulum system

We assume that the pendulum is a narrow, uniform rod and there is no friction at the pivot
and no actuator dynamics. The inverted pendulum system was simulated using the following
equations of motion:

$$\ddot{\theta} = \frac{3}{4L} (g \sin\theta - \ddot{X} \cos\theta), \tag{35}$$

$$\ddot{X} = \frac{m (L \sin\theta\, \dot{\theta}^2 - \frac{3}{8} g \sin 2\theta) - f \dot{X} + u}{M + m (1 - \frac{3}{4} \cos^2\theta)}, \tag{36}$$

where $M = 1$ (kg), $m = 0.1$ (kg), $L = 1$ (m), $f = 5$ (kg/s), and $g = 9.81$ (m/s$^2$).

This system was simulated numerically using Euler’s approximation method,
$x[k+1] = x[k] + T \dot{x}[k]$, with a time step $T = 0.02$ second. The sampling rate of the
inverted pendulum’s states and the rate at which control forces are applied are the same as
the simulation rate, i.e., 50 Hz. Our control goal here was to balance the pendulum by applying
a sequence of right and left forces regardless of the cart’s position and velocity. We assumed
the pendulum angle $\theta$ to be measurable. To obtain the training data, we employed the
nonlinear control law [23]:


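$$h_1 = \frac{3}{4L} g \sin\theta, \tag{37}$$

$$h_2 = \frac{3}{4L} \cos\theta, \tag{38}$$

$$f_1 = m (L \sin\theta\, \dot{\theta}^2 - \frac{3}{8} g \sin 2\theta) - f \dot{X}, \tag{39}$$

$$f_2 = M + m (1 - \frac{3}{4} \cos^2\theta), \tag{40}$$

$$u = \frac{f_2}{h_2} [h_1 + k_1 (\theta - \theta_d) + k_2 \dot{\theta}] - f_1, \tag{41}$$

where $k_1 = 25$, $k_2 = 10$.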
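
A minimal sketch of the teacher in closed loop with an Euler simulation is shown below. It rests on several assumptions that should be checked against the original report: the equations of motion and control law are taken in the forms $\ddot{\theta} = \frac{3}{4L}(g \sin\theta - \ddot{X}\cos\theta)$, $\ddot{X} = (f_1 + u)/f_2$ and $u = \frac{f_2}{h_2}[h_1 + k_1(\theta - \theta_d) + k_2\dot{\theta}] - f_1$, the desired angle $\theta_d$ is zero, and the initial angle of 5.0 is interpreted as degrees. All identifiers are hypothetical.

```python
import math

M_CART, M_ROD, LENGTH, FRICTION, G = 1.0, 0.1, 1.0, 5.0, 9.81
K1, K2 = 25.0, 10.0
DT = 0.02  # Euler time step in seconds (50 Hz)


def force_terms(x_dot, theta, theta_dot):
    """Terms f1 and f2 shared by the cart dynamics and the control law (assumed forms)."""
    f1 = (M_ROD * (LENGTH * math.sin(theta) * theta_dot ** 2
                   - 0.375 * G * math.sin(2.0 * theta)) - FRICTION * x_dot)
    f2 = M_CART + M_ROD * (1.0 - 0.75 * math.cos(theta) ** 2)
    return f1, f2


def teacher_force(x_dot, theta, theta_dot, theta_d=0.0):
    """Nonlinear control law acting as the teacher (assumed form)."""
    h1 = 3.0 / (4.0 * LENGTH) * G * math.sin(theta)
    h2 = 3.0 / (4.0 * LENGTH) * math.cos(theta)
    f1, f2 = force_terms(x_dot, theta, theta_dot)
    return f2 / h2 * (h1 + K1 * (theta - theta_d) + K2 * theta_dot) - f1


def euler_step(state, u):
    """One Euler step x[k+1] = x[k] + T * xdot[k] for state [X, X_dot, theta, theta_dot]."""
    x, x_dot, theta, theta_dot = state
    f1, f2 = force_terms(x_dot, theta, theta_dot)
    x_ddot = (f1 + u) / f2
    theta_ddot = 3.0 / (4.0 * LENGTH) * (G * math.sin(theta) - x_ddot * math.cos(theta))
    return [x + DT * x_dot, x_dot + DT * x_ddot,
            theta + DT * theta_dot, theta_dot + DT * theta_ddot]


def simulate(theta0, steps):
    """Run the teacher in closed loop and record (theta, u) pairs as training data."""
    state = [1.0, 0.0, theta0, 0.0]
    history = []
    for _ in range(steps):
        u = teacher_force(state[1], state[2], state[3])
        history.append((state[2], u))
        state = euler_step(state, u)
    return state, history


final_state, training_pairs = simulate(math.radians(5.0), 500)
```

Under these assumptions the control law cancels the nonlinear terms, so the angle obeys $\ddot{\theta} = -k_1(\theta - \theta_d) - k_2\dot{\theta}$ and decays toward $\theta_d$, while the cart position is left uncontrolled.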

In order to obtain training data, 80 iterations of simulation were conducted with the above
control law and initial state $[X, \dot{X}, \theta, \dot{\theta}]^T = [1.0, 0.0, 5.0, 0.0]^T$. Noise with zero mean and
0.01 variance was added to the initial value of $\theta$ to avoid identical data in all iterations. The
values of $\theta$ and $u$ were recorded for training the HMMs. The feedback signal $\theta$ was quantized
into 256 levels and the control signal $u$ was partitioned into 40 decision patterns. Therefore,
a total of 40 HMMs, each of which contains 6 states, was employed to encode the decision
patterns. Each HMM was trained with the corresponding data. Eight sequential samples of the
feedback signal were evaluated each time to determine control patterns. Figure 8 compares
the response of the HMM-based control system and the ideal response of the nonlinear
control system.


5 Conclusion 

In many real-world applications, the control strategy cannot be explicitly expressed as a
function, but can be given by examples. Learning from examples is a feasible way to develop a
controller automatically. If the control strategy has uncertainties, the learning method must
have a mechanism to cope with its stochastic nature. The HMM is a powerful parametric model
and is well suited to characterizing a doubly stochastic process. This report presents a
learning controller based on HMMs to characterize and learn the decision patterns. We have
developed a technique to build an HMM-based learning controller. We demonstrated the
feasibility of the proposed method by simulation studies. We investigated two cases: a linear
system and an inverted pendulum system.

Although we have examined only cases of well-known controllers as teachers, the teacher can
be any type of controller, including control strategies used by a human. For a MIMO system,
it is possible to use a multi-dimensional HMM to encode the control strategy. A
multi-dimensional HMM is an HMM which has more than one observable symbol at each time $t$
and is capable of dealing with multiple feedback signals. A multi-dimensional HMM is also
appropriate for fusing different sensory signals with different physical meanings. However,
to use HMMs for a MIMO system, a more efficient method must be developed to generate
multiple control outputs. A possible way is to use the VQ technique for partitioning the
control space and then to use the same method as for an SISO system for generating each
component of the control vector. Future research on HMM-based strategy modeling lies
in both theory and practice. Because an HMM-based controller is nonlinear, it is difficult to
analyze the HMM-based system. A theory is needed to analyze the stability and performance
of the system. In practice, more realistic problems should be investigated, which will lead to
the development of new methods for designing HMM-based controllers. One short-term
extension is to develop a systematic procedure for optimally partitioning the control space.
Figure 8: Responses of HMM-based system and the teacher